An Unsupervised Method for Canonicalization of Japanese Postpositions

Author

  • Kentaro Torisawa
Abstract

We present an unsupervised method for canonicalizing joshi (postpositions) in Japanese. Some postpositions in Japanese do not specify semantic roles explicitly as case markers do, although they behave syntactically like case markers. Such postpositions include “wa,” which topicalizes noun phrases, and “mo,” which emphasizes noun phrases. In this paper, we replace these postpositions in a sentence with case markers while changing the meaning of the original sentence as little as possible. This leads to canonicalization, or paraphrasing, of verb phrases into canonical forms with desirable properties. Our method uses case frames and semantic word classifications induced by the Expectation Maximization algorithm. The induction process is unsupervised in the sense that no semantic clues are given before the induction of the case frames and the word classifications.
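
The replacement step described in the abstract can be pictured with a toy sketch. It is not the paper's model: the latent classes, the factorization P(class) · P(noun | class) · P(verb, marker | class), the candidate marker set, and every number below are assumptions made purely for illustration, and the tables are hard-coded here rather than induced by Expectation Maximization.

```python
# Toy sketch only. All classes, words, and probabilities are hypothetical;
# in the paper such parameters would be induced by EM, not written by hand.

CASE_MARKERS = ["ga", "wo", "ni"]  # assumed candidate replacements for "wa"/"mo"

# P(class): prior over hypothetical latent semantic word classes.
P_CLASS = {"food": 0.5, "person": 0.5}

# P(noun | class)
P_NOUN = {
    "food": {"ringo": 0.9, "taro": 0.1},
    "person": {"ringo": 0.1, "taro": 0.9},
}

# P((verb, marker) | class): a crude stand-in for a case frame slot.
P_SLOT = {
    "food": {("taberu", "wo"): 0.8, ("taberu", "ga"): 0.1, ("taberu", "ni"): 0.1},
    "person": {("taberu", "wo"): 0.1, ("taberu", "ga"): 0.8, ("taberu", "ni"): 0.1},
}


def marker_scores(noun, verb):
    """Score each candidate case marker by summing over the latent classes."""
    scores = {}
    for marker in CASE_MARKERS:
        scores[marker] = sum(
            P_CLASS[c]
            * P_NOUN[c].get(noun, 0.0)
            * P_SLOT[c].get((verb, marker), 0.0)
            for c in P_CLASS
        )
    return scores


def canonicalize(noun, verb):
    """Return the highest-scoring case marker for a topicalized noun."""
    scores = marker_scores(noun, verb)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # "ringo wa taberu" and "taro wa taberu" under the toy tables above
    print(canonicalize("ringo", "taberu"))
    print(canonicalize("taro", "taberu"))
```

Under these invented tables, a food-like noun in front of "taberu" (eat) is assigned the object marker and a person-like noun the subject marker, which is the kind of disambiguation the abstract attributes to case frames combined with semantic word classes.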


Similar resources

Locative Postpositions and Conceptual Structure in Japanese

This paper proposes two syntax-semantics correspondence rules which consistently account for the distribution of Japanese locative postpositions ni and de. We demonstrate how to adapt the machinery of the occurrence of the postpositions based on the assumption of Conceptual Semantics (Jackendoff, 1983; 1990; 1991) to fit the organization of Japanese grammar. The correspondence rules correlate w...

Extracting and Classifying Urdu Multiword Expressions

This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurren...

Designing multiple distinctive phonetic feature extractors for canonicalization by using clustering technique

Acoustic models of an HMM-based classifier include various types of hidden factors such as speaker-specific characteristics and acoustic environments. If there exists a canonicalization process that represses the decrease of differences in acoustic likelihood among categories resulting from hidden factors, a robust ASR system can be realized. We have previously proposed the canonicalization proce...

BotOnus: an online unsupervised method for Botnet detection

Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods produce a number of good ideas, but they are far from complete yet, since most of them cannot detect botnets in an early stage ...

An Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network

The RoboCup competition, as a great test-bed, has become a popular domain worldwide in recent years. The main objective of such competitions is to deal with the complex behavior of systems which consist of multiple autonomous agents. The rich experience of human soccer players can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...

Publication date: 2001